Conversation

@DaveCTurner (Contributor)

No description provided.

@DaveCTurner requested a review from a team as a code owner February 24, 2025 08:52
@elasticsearchmachine added the Team:Core/Infra label (Meta label for core/infra team) Feb 24, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-core-infra (Team:Core/Infra)

@DiannaHohensee (Contributor) left a comment

This improvement led me to a lot of further questions 😅

@DiannaHohensee (Contributor) left a comment

LGTM. Some comments remain, but they're minor. Looks nice.

* performance (especially if using spinning disks).
* <p>
* Blocking on a future on this pool risks deadlock if there's a chance that the completion of the future depends on work being done
* on this pool. Unfortunately that's pretty likely in most cases because of how often this pool is used; it's really rare because
@DiannaHohensee (Contributor) Mar 4, 2025

This made me wonder about the details of how a deadlock could occur, so that one can avoid it properly. The wording seemed to suggest that Executor tasks are executed in order of submission, so I was thinking maybe the task waiting on the future gets put on the queue first. But that would mean a caller queues the future-waiting task before the task that it waits on. Is that really the scenario, or is there something else at play?

In short, if you could elaborate on the particulars of how the deadlock can happen, that would help readers avoid it.

@DaveCTurner (Contributor, Author)

I don't think execution order matters (although yes the queue is FIFO). It's just what it says: blocking generic threads on futures risks deadlock if the completion of those futures requires work to be done on the generic pool. There's no way to do that work if all the threads that could do it are blocked.
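
To make the failure mode concrete, here is a minimal sketch using plain JDK executors rather than Elasticsearch's ThreadPool or its action futures (the class name and the pool size of 1 are just for illustration): a pool thread blocks on a future whose completion requires another task to run on that same pool, and once every thread in the pool is blocked this way, nothing can make progress.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolSelfDeadlock {
    public static void main(String[] args) throws Exception {
        // One thread stands in for a pool whose threads are all busy;
        // the small size just makes the hazard deterministic.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        Future<String> outer = pool.submit(() -> {
            // The inner task can only run on this same pool...
            Future<String> inner = pool.submit(() -> "done");
            // ...but the only pool thread is blocked right here waiting for it,
            // so the inner task never starts and this get() never returns.
            return inner.get();
        });

        try {
            System.out.println(outer.get(2, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            System.out.println("deadlocked: the pool thread is waiting on work that needs the pool");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a large pool the same pattern usually works by luck, which is why the Javadoc describes the deadlock as rare but extremely harmful when it does happen.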

@DiannaHohensee (Contributor)

Thanks for referencing UnsafePlainActionFuture 👍

* on this pool. Unfortunately that's pretty likely in most cases because of how often this pool is used; it's really rare because
* of the high limit on the pool size, but when it happens it is extremely harmful to the node.
* <p>
* This pool is for instance used for recovery-related work, which is a mix of CPU-bound and IO-bound work. The recovery subsystem
@DiannaHohensee (Contributor)

I'd bump this paragraph higher and swap it with the deadlock paragraph above. It reads like a natural continuation of the CPU/IO discussion in the preceding paragraphs, whereas deadlocking doesn't participate in that line of reasoning (but currently interrupts it).

@DaveCTurner (Contributor, Author)

I left it where it is but clarified that recovery also doesn't block on futures, so this now follows more logically from the preceding paragraphs.

@DiannaHohensee (Contributor)

looks good 👍

@DiannaHohensee (Contributor) left a comment

lgtm 👍

@DaveCTurner added the auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) and auto-backport (Automatically create backport pull requests when merged) labels Mar 13, 2025
@elasticsearchmachine merged commit fd7866a into elastic:main Mar 13, 2025
17 checks passed
@DaveCTurner deleted the 2025/02/24/threadpool-docs-expansion branch March 13, 2025 20:35
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 13, 2025
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 13, 2025
@elasticsearchmachine (Collaborator)

💚 Backport successful

Branch  Result
8.x     backported
9.0     backported


Labels

auto-backport (Automatically create backport pull requests when merged)
auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
:Core/Infra/Core (Core issues without another label)
>non-issue
Team:Core/Infra (Meta label for core/infra team)
v8.19.0
v9.0.1
v9.1.0

3 participants